A combination of lexical bias and altered auditory feedback was used to investigate the
influence of higher-order linguistic knowledge on the perceptual aspects of speech motor 
control. Subjects produced monosyllabic real words or pseudo-words containing the vowel 
[e] (as in "head") under conditions of altered auditory feedback involving a decrease in 
vowel first formant (F1) frequency. This manipulation had the effect of making the vowel 
sound more similar to [I] (as in "hid"), affecting the lexical status of produced words in 
two Lexical-Change (LC) groups (either changing them from real words to pseudo-words: 
e.g., less — liss, or pseudo-words to real words: e.g., kess — kiss). Two Non-Lexical-Change 
(NLC) control groups underwent the same auditory feedback manipulation during the 
production of [e] real- or pseudo-words, only without any resulting change in lexical status 
(real words to real words: e.g., mess — miss, or pseudo-words to pseudo-words: e.g., 
ness — niss). The results from the LC groups indicate that auditory-feedback-based speech 
motor learning is sensitive to the lexical status of the stimuli being produced, in that 
speakers tend to keep their acoustic speech outcomes within the auditory-perceptual 
space corresponding to the task-related side of the word/non-word boundary (real words 
or pseudo-words). For the NLC groups, however, no such effect of lexical status is 
observed. 

Keywords: speech production, sensorimotor integration, lexical effect, altered auditory feedback, language 
processing 



INTRODUCTION 

Linguistic processing of the speech acoustic signal is a noto- 
riously challenging phenomenon: speakers and listeners must 
selectively navigate ambiguous auditory scenes in real time and 
filter out irrelevant noise from the ambient environment (Zion
Golumbic et al., 2012). To achieve this, the nervous system must
be able to extract and integrate various types of perceptual, motor 
and linguistic information from the incoming speech stream. 
Instances of information integration across the perceptual and 
motor domains have been particularly well documented in both 
speech perception and production (see below), but systematic 
attempts to understand how these various components interact 
with each other as well as with more abstract levels of linguis- 
tic representation have just begun to take shape (Hickok et al., 
2011; Hickok, 2012). The present study aims to contribute to this 
effort by exploring the possible top-down influence of linguis- 
tic information — in this case, the lexical status of words — on the 
sensorimotor networks supporting spoken language production. 

In language perception, it has long been demonstrated that 
linguistic context can determine how speech sounds or words 
are interpreted (Miller et al., 1951; Warren and Sherman, 1974; 
Ganong, 1980). Amongst the most illustrative examples of this is 
Ganong's (1980) lexical effect on phoneme identification, whereby 
subjects are presented with a phonetically ambiguous consonant 
(e.g., between a [d] and a [t], as determined by voice onset time)
occurring in a lexical context (e.g., "-ash") such that one interpretation
is consistent with a real word (e.g., "dash") and the other
maps onto a pseudo-word ("tash"). Ganong's (1980) study and 
later replications indicate that the identification of a target sound 
is biased toward real words (Figure 1), effectively shifting phone- 
mic boundaries in favor of existing lexical entries (Connine and 
Clifton, 1987; Burton et al., 1989; Pitt, 1995). 

These findings have been taken to suggest a direct influence
of lexical knowledge on the classification of phonetic categories.
However, how phonetic-lexical integration takes place, and at which
stage of the recognition process, remains a matter of debate
(Fox, 1984; Connine and Clifton, 1987; Pitt and
Samuel, 1993; Pitt, 1995; Myers and Blumstein, 2008). One key
question is whether this lexical effect emerges from a direct 
influence of lexical entries on the perception of phonetic input 
(Figure 2A), or whether lexical knowledge is used at a later 
stage for the purpose of explicit perceptual decision-making 
(Figure 2B). Methodologically, disentangling this issue has been 
complicated by the fact that the lexical effect has been investi- 
gated through observation of participants' explicit classification 
of auditory stimuli (i.e., phoneme identification tasks). Such overt 
classificatory tasks are not typical of speech perception under nat- 
urally occurring conditions, hence it remains unclear to what 
extent lexical knowledge plays a role in everyday phonetic pro- 
cessing (Fox, 1984). If the lexical effect is primarily related to 
overt decision-making ability, and not to the perception of speech
acoustic properties per se, then doing away with any such explicit 
classificatory task would be expected to reduce or eliminate 
the influence of lexical knowledge on speech perception. Recent 
appeal to neuroimaging methods such as fMRI, MEG, and/or 
EEG (Gow et al., 2008; Myers and Blumstein, 2008) has helped
to address this question. In particular, Gow et al. (2008) were able
to gain spatially and temporally fine-grained neurophysiological
evidence supporting a causal influence of brain areas involved 
in lexical knowledge on those supporting first-order phonetic 
processing of ambiguous speech stimuli (similar to those used 
by Ganong, 1980) before areas shown to be involved in percep- 
tual decision-making were recruited (e.g., the left inferior frontal 
gyrus, see Burton et al., 2000). These findings thus support the 
notion of a direct top-down influence of lexical representations 
on speech perception. 

Efficient speech production is also known to result from the 
integration of articulatory motor control with both perceptual 
and linguistic information, but the influence of these cues upon 
speech motor processes has so far been investigated using largely 
non-overlapping approaches (Hickok, 2012; see also Hickok et al.,
2011). On the one hand, numerous studies have emphasized that
acoustic-phonetic goals guide articulatory control (Houde and
Jordan, 1998; Tourville et al., 2008; Shiller et al., 2009). A standard
way to demonstrate this influence is to perturb participants' auditory
feedback as they produce syllables or words (e.g., "head") in
such a way that a change in acoustic properties (e.g., a decrease
in the vowel first formant frequency, or F1) results in the perception
of a different speech sound (e.g., a vowel closer to [I], as in
"hid"). The perceived deviation of the produced speech acoustic
signal from the acoustic target provokes a compensatory change
in participants' speech output (e.g., an F1 increase), indicating a
reliance on the phonetic processing of auditory feedback in guiding
and adapting future productions (Houde and Jordan, 1998;
Shiller et al., 2009).

On the other hand, psycholinguistic research has pointed to 
the influence of abstract lexical information on speakers' articu- 
latory patterns (Levelt, 1983, 1989), in particular their tendency 
to substitute phonemes more often in words or word strings 
when these substitutions yield real words (barn door → darn bore)
as opposed to pseudo-words (dart board → *bart doard; Baars
et al., 1975; Costa et al., 2006; Oppenheim and Dell, 2008).
Interestingly, this so-called lexical bias echoes Ganong's lexical 
effect in showing that the speech production system is biased 
toward real words relative to pseudo-words. 

FIGURE 1 | Illustration of the lexical effect (adapted from Ganong,
1980). The perception of a stop consonant as voiced (e.g., [d]) or unvoiced
(e.g., [t]) depends primarily upon the duration of the interval between the
stop release "burst" and the onset of vocalization (voice-onset time, or
VOT), whereby a short VOT is perceived as voiced and a long VOT is
perceived as unvoiced. When a continuum of stimuli is presented to
subjects that involve stop consonants with systematically increasing VOT,
the perception shifts from voiced ([d]) to unvoiced ([t]), shown as a
decreasing tendency to identify stimuli as voiced. In this example of the
lexical effect, the proportion of perceived voiced responses (represented on
the y-axis) for a given VOT stimulus tends to increase (red dashed line) or
decrease (blue dashed line) as a function of whether the stimuli correspond
to real words (e.g., "dash" and "toot") or pseudo-words (e.g., "tash" and
"doot").

FIGURE 2 | Competing accounts of the lexical effect on phoneme perception. (A) Lexical knowledge directly influences phonetic perception; (B) Lexical
knowledge is used at a later stage for the purpose of explicit decision-making.

FIGURE 3 | Schematic illustration of the effects predicted in the
present study. Vowels are altered by a change in F1 frequency, resulting in
[e] productions being perceived more like [I] (black arrow at top). The
resulting compensatory change in speech output (red and blue arrows) has
the effect of restoring the subject's perception of their vowel to the original
vowel category [e]. Top panel: F1 vowel alterations that have the effect of
changing the lexical status of a real word (e.g., less) to a pseudo-word (e.g.,
liss) are predicted to yield reduced compensatory changes in participants'
productions (red arrow) as a result of the perceptual lexical effect, whereby
the boundary for the vowel contrast is shifted in the direction of [I] (red
dashed line). Bottom panel: When the same F1 vowel alteration has the
effect of changing the lexical status of a pseudo-word to a real word, a
larger compensatory change is expected (blue arrow) due to the perceptual
boundary shifting in the direction of the vowel [e] (blue dashed line).

It is evident from these separate demonstrations that lexical
status influences the perception of speech sounds, and that both
auditory-perceptual and lexical information play a role in the
neural control of speech production. What remains unknown
is the extent to which lexical and phonetic information may 
interact during the control of speech production. We investi- 
gated this question by combining a variant of Ganong's (1980) 
lexical manipulation with a paradigm of sensorimotor adap- 
tation to altered auditory feedback during speech production. 
Specifically, we compared participants' motor adaptation to a perceived
decrease in F1 frequency during production of the vowel
[e] (resulting in a vowel perceived to be closer to [I]) under
two different conditions. In the Lexical-Change (LC) condition, 
participants produced a series of [e] real-words (e.g., less) or 
pseudo-words (e.g., kess) that were chosen in such a way that 
the feedback alteration resulted in the perception of stimuli as 
having a different lexical status (i.e., a real-word perceived as a 
pseudo-word, or a pseudo-word perceived as a real-word). On 
the basis of previous demonstrations of the lexical effect (Baars
et al., 1975; Ganong, 1980), we hypothesized that in participants'
phonetic perception of their own speech auditory feedback,
the phonetic boundary would be biased in accordance with lexical
status (with the boundary shifting in the direction of the
vowel in the non-lexical item, thus enlarging the area along the
continuum containing the real word) and that this bias would
be reflected in their patterns of articulatory compensation (see
Figure 3). An interaction between phonetic boundaries and the
degree of auditory-feedback-based speech compensation has been
demonstrated in a recent study by Niziolek and
Guenther (2013), who observed that the magnitude of compensation
to real-time formant alterations was larger when perturbations
"push" the perceived sound into the region of the boundary
between two phonemes (e.g., between [e] and [æ]). While that
study did not involve manipulations of lexical status or changes 
to the phonetic boundary itself (capitalizing, rather, on natu- 
rally occurring variation in formant frequencies among repeated 
productions of the same vowel), it supports the notion that the 
sensory error signal that drives speech motor compensation is 
defined, in whole or in part, by the proximity of acoustic output 
to the phonetic boundary. 

The speech adaptation effects obtained in the LC condition
were compared to those in a Non-Lexical-Change (NLC) control condition,
in which participants once again produced either [e] real words
or pseudo-words. Unlike in the LC condition, however, the
stimuli in this condition were selected such that the F1 shifts
did not result in a change in lexical status (i.e., a pseudo-word
remained a pseudo-word, and a real word remained a real word).
We expected that subjects in the NLC condition would not exhibit 
a difference in speech adaptation depending on whether they 
produced real words or pseudo-words, owing to the lack of any 
perceptual boundary shift and/or change in lexical status. 

MATERIALS AND METHODS 
SUBJECTS 

Forty adult subjects were tested (age range: 18-30 years). All 
were native speakers of English and had no reported history of 
speech, language or hearing disorder. Hearing status was verified 
immediately prior to testing using a pure-tone hearing screen- 
ing (threshold <20 dB HL at octave frequencies between 250 and 
4000 Hz). Subjects provided written informed consent prior to
testing. All procedures were approved by the Institutional Review
Board, Faculty of Medicine, McGill.

STIMULUS WORDS AND GROUP ASSIGNMENT 

Subjects were randomly assigned to one of four groups (10 
subjects in each, 5 males and 5 females), each of which under- 
went an identical series of speech production tasks (see Table 1 
and Procedures below), including the production of monosyllabic 
words (real-words or pseudo-words) under normal-feedback 
conditions (baseline task) followed by the production of words 
under conditions of altered auditory feedback (speech adaptation
task). For all four groups, the words produced in the
speech adaptation task contained the vowel [e] (e.g., "bed" or
"geck"). During this task, an acoustic manipulation was carried
out in real time such that the vowel [e] was perceived
to be closer to the vowel [I] (as in "bid"; see Real-time alteration
of speech below). The key difference among the four
groups relates to the stimuli produced under these conditions:
for two of the groups (lexical-change), the stimuli were such that
the acoustic manipulation of the vowel changed the lexical status
of the item: one group (LC-1: word to pseudo-word) produced a set
of real words that, when altered, resulted in a set of pseudo-words
(e.g., "test" becoming "tist"), and another group (LC-2: pseudo-word
to word) produced a set of pseudo-words that, when altered,
resulted in a set of real words (e.g., "ked" becoming "kid"). For
the two other groups (non-lexical-change), the stimuli were such
that the acoustic manipulation of the vowel did not alter their lexical
status: one group (NLC-1: word to word) produced a set of
real words that, when altered acoustically, resulted in another set
of real words (e.g., "bed" becoming "bid"), and a second group
(NLC-2: pseudo-word to pseudo-word) produced a set of pseudo-words
that, when altered acoustically, resulted in a different set of
pseudo-words (e.g., "geck" becoming "gick").

For subjects in the two lexical-change groups (LC-1 and LC-2), 
the stimulus set consisted of 10 pairs of items (see Table 1, left). 
For subjects in the two non-lexical-change groups (NLC-1 and 
NLC-2), the stimulus set consisted of 9 pairs of items (see Table 1 
right). The slight difference in the number of stimuli was due to 
the difficulty in finding words and pseudo-words that met the 
required phonetic and lexical criteria for the NLC control groups. 
For the baseline production task, subjects produced each item 5
times in a fully randomized order, yielding a total of 100
productions for the LC groups (20 items × 5 repetitions) and 90
productions for the NLC groups (18 items × 5 repetitions).

Table 1 | Distribution of groups as a function of condition.

The sets of words and pseudo-words used in all of the groups
(including the target [e]-words and the [I]-words resulting from
the acoustic manipulation) were matched on a number of criteria,
including neighborhood density (Pisoni and Tash, 1974), which
reflects the number of words that are phonologically similar to the
target words, and bi-phonemic probability, which represents how
frequently the phoneme pairs in the target words occur together
in the lexicon (Vitevitch and Luce, 2004; see Table 2). Both of
these variables have been shown to influence word production
(Munson and Solomon, 2004; Goldrick and Larson, 2008). A
three-way analysis of variance (ANOVA), including the factors
Condition (Lexical change vs. Non-lexical change), Word Type
(Word vs. Pseudo-word) and Vowel ([e] vs. [I]), was carried out
separately for these two measures. For neighborhood density,
none of the main or interaction effects was significant (p > 0.05).
A significant difference in bi-phonemic probability was found
between [e] and [I] words [F(1, 59) = 8.22, p < 0.006]: it was
significantly higher in [I]-words relative to [e]-words (0.006 vs.
0.003). No significant interactions were obtained (all p's > 0.06),
and no reliable difference was observed for this variable between
the different lexical change conditions or word types (p > 0.05);
thus the four groups remained matched.

TESTING PROCEDURES 

Speech was recorded in a quiet testing room using a head- 
mounted microphone (C520, AKG, Germany) and digitized at 
16-bit/44.1 kHz on a PC using custom software written in Matlab 
(Mathworks, MA). Auditory speech signals were presented to 
subjects using circumaural headphones (880 pro, Beyerdynamic, 
Germany). 

All subjects underwent the following sequence of tasks: 

1. Baseline speech production: The first task involved the produc- 
tion of a set of stimuli under conditions of normal auditory 
feedback. Each stimulus was presented orthographically on a 
computer monitor for 1.5 s, followed by a blank screen for 
2 s between items. Subjects were instructed to produce each 
item as soon as it appeared on screen. Prior to beginning, 
subjects practiced producing each stimulus item once and 
their pronunciation was corrected, if necessary, by the experi- 
menter (for all items, subjects were told to produce the vowel 
"e" as in the word "head," and the vowel "i" as in the word 
"hid"). 

2. Test of speech motor adaptation: For subjects in each group,
the baseline production task was followed by a test of speech
motor adaptation involving 160 productions of words (or
pseudo-words) containing the vowel [e] from the group's target
stimulus list. Word order was randomized. Similar to
prior studies of speech adaptation to altered auditory feedback
(Purcell and Munhall, 2006; Villacorta et al., 2007; Shiller et al.,
2009; Rochet-Capellan and Ostry, 2011), subjects underwent
a sequence of auditory feedback conditions involving an initial
period of normal feedback (30 trials, null phase), followed
by a period of practice under conditions of altered auditory
feedback (100 trials, hold phase), and then finally a period
under normal feedback once again, to test for the presence of
learning after-effects (30 trials, after-effect phase). The auditory
feedback manipulation corresponded to a 30% decrease
in the frequency of the first spectral peak (the first formant,
or F1), which resulted in a vowel that was perceived to be
more like the vowel [I] (see Real-time alteration of speech for
details).

Table 2 | Control measures for experimental stimuli, including neighborhood density (ND) and word-average bi-phonemic probability (PB).

REAL-TIME ALTERATION OF SPEECH 

The alteration of auditory feedback involved a 30% decrease in
the first formant (F1) of the vowel acoustic signal (average shift:
216.3 Hz). The F1 manipulation was carried out using a system
that has been described previously (Rochet-Capellan and Ostry,
2011; Shum et al., 2011; Lametti et al., 2012; Mollaei et al., 2013).
The microphone signal was amplified and split into two channels:
one providing an unprocessed signal and the other altered
using a digital signal processor (DSP) to decrease the frequency
of all vowel formants (VoiceOne, TC Helicon). The VoiceOne
is a commercial DSP designed to alter human speech signals,
using source-filter modeling and re-synthesis to independently
manipulate fundamental frequency (F0) and formant frequencies
(with all formant frequencies shifted proportionally). While
the specific processing algorithm is proprietary, the magnitude
of formant manipulations and independence of formant and F0
changes were verified empirically by analyzing input and output
signals. The vowel alteration was restricted to F1 by splitting both
signals into non-overlapping low- and high-frequency compo- 
nents (Wavetek 753 low/high pass filter), and then mixing the 
low-frequency portion of the processed signal with the high- 
frequency portion of the unprocessed signal. The filter cutoff 
used to separate the two signals was set at 1100 Hz for males 
and 1350 Hz for females, each of which lies roughly half-way 
between the first and second formant values for the produc- 
tion of the vowel [e] for men and women respectively (based 
upon pilot studies). The total signal processing delay was less 
than 15 ms. 
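
As a rough arithmetic check on the values given in this section, the sketch below computes the F1 shift implied by a 30% decrease and the F1/F2 midpoint that motivates the filter cutoffs. It uses the group-mean baseline formant values reported under Baseline Measures; the function names are illustrative assumptions and are not part of the original Matlab or DSP setup.

```python
# Sketch only: arithmetic check of the feedback shift, not the authors' signal chain.

SHIFT_PROPORTION = 0.30  # 30% decrease in F1, as stated in the text

# Group-mean baseline formants (Hz) for LC-1, LC-2, NLC-1, NLC-2 (Baseline Measures).
BASELINE_F1 = [756.7, 670.0, 722.7, 739.3]
BASELINE_F2 = [1887.6, 1828.2, 1831.6, 1828.9]


def shifted_f1(f1_hz, proportion=SHIFT_PROPORTION):
    """Return the F1 heard by the speaker after a proportional decrease."""
    return f1_hz * (1.0 - proportion)


def shift_in_hz(f1_hz, proportion=SHIFT_PROPORTION):
    """Return the size of the downward F1 shift in Hz."""
    return f1_hz * proportion


def midpoint(f1_hz, f2_hz):
    """Return the frequency half-way between F1 and F2."""
    return (f1_hz + f2_hz) / 2.0


mean_f1 = sum(BASELINE_F1) / len(BASELINE_F1)
mean_f2 = sum(BASELINE_F2) / len(BASELINE_F2)

print(f"mean baseline F1: {mean_f1:.1f} Hz")
print(f"implied shift:    {shift_in_hz(mean_f1):.1f} Hz")
print(f"F1/F2 midpoint:   {midpoint(mean_f1, mean_f2):.1f} Hz")
```

With these pooled means, the implied shift lies within about 1 Hz of the reported average of 216.3 Hz, and the pooled midpoint falls between the male (1100 Hz) and female (1350 Hz) cutoffs. Pooling across sexes is a simplification made here for illustration only.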

Subjects were encouraged to maintain a constant speaking vol- 
ume throughout the task through the use of a VU meter presented 
on the computer display (showing current and peak acoustic sig- 
nal level during each trial). Subjects were instructed to maintain 
a target level on the display, which was adjusted at the begin- 
ning of the experiment to correspond to a comfortable speaking 
volume. The subject's perception of his or her own air/bone-conducted
speech acoustic signal was reduced by mixing the
auditory feedback signal (presented at approximately 75 dB SPL)
with speech-shaped masking noise (presented at approximately
60 dB SPL).



ACOUSTIC ANALYSES 

For each word production in the baseline and speech adaptation 
tasks, a 30 ms segment centered about the midpoint of the vowel 
was selected using an interactive computer program that dis- 
played the waveform and spectrogram of each utterance, allowing 
the experimenter to identify a stable, artifact-free region near the 
vowel center. Mean F1 and F2 (second formant) frequencies for
each segment were then estimated using LPC analysis in Matlab.
LPC parameters were chosen on a per-subject basis in order
to minimize the occurrence of clearly spurious formant values.
Values of F1 and F2 frequency were used to directly compare
vowel acoustic properties during baseline productions of the 
different stimulus words among the different groups. 

Analysis of vowel acoustics during the speech adaptation task 
was restricted to F1 frequency, as the compensatory response
was primarily observed in this acoustic parameter. During the
adaptation task, changes in F2 under conditions of altered auditory
feedback rarely exceeded 1% of baseline values, consistent
with other reports of vowel feedback manipulations that were
restricted to F1 (see, e.g., Purcell and Munhall, 2006; Villacorta
et al., 2007). Following Villacorta et al. (2007) and others (Rochet-Capellan
and Ostry, 2011; Shum et al., 2011; Lametti et al., 2012;
Mollaei et al., 2013), changes in vowel production during the
speech adaptation task were computed as the proportion change
in F1 frequency relative to the mean values during the null phase
(averaged over trials 11-30). Such normalized units (which convey
changes in formant values relative to a nominal value of 1) are
preferable to non-normalized units as they account for individual 
differences in baseline acoustic properties (e.g., between men and 
women). Differences in speech adaptation between the different 
groups were evaluated at three time-points: (1) the beginning of 
the Hold phase (averaged over trials 31-60) under conditions of 
altered auditory feedback, (2) the end of the Hold phase (trials 
101-130) under conditions of altered auditory feedback, and (3) 
during the After-Effect phase following removal of the feedback 
manipulation (trials 131-160). 
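
The normalization and the three analysis windows described above can be sketched as follows. This is a minimal illustration, assuming one F1 estimate per trial stored in trial order for the 160 adaptation trials; the function names and the example speaker are hypothetical, and the original analysis was carried out in Matlab.

```python
# Sketch only: assumes one F1 estimate (Hz) per trial, in trial order (160 trials).
from statistics import mean

# Trial ranges are 1-based and inclusive, as in the text.
PHASES = {"null": (1, 30), "hold": (31, 130), "after_effect": (131, 160)}
BASELINE_WINDOW = (11, 30)  # null-phase trials used as the reference
ANALYSIS_WINDOWS = {
    "early_hold": (31, 60),
    "late_hold": (101, 130),
    "after_effect": (131, 160),
}


def window_mean(values, window):
    """Mean of values over a 1-based inclusive trial window."""
    first, last = window
    return mean(values[first - 1:last])


def normalize_f1(f1_hz):
    """Express each trial's F1 as a proportion of the null-phase mean."""
    baseline = window_mean(f1_hz, BASELINE_WINDOW)
    return [f1 / baseline for f1 in f1_hz]


def summarize(f1_hz):
    """Return mean normalized F1 at the three analysis time-points."""
    normalized = normalize_f1(f1_hz)
    return {name: window_mean(normalized, win) for name, win in ANALYSIS_WINDOWS.items()}


# Hypothetical speaker: 700 Hz at baseline, 5% higher F1 throughout the hold phase.
example = [700.0] * 30 + [735.0] * 100 + [700.0] * 30
print(summarize(example))
```

In these units a value of 1 means no change from baseline, so a compensatory F1 increase appears as a value above 1 regardless of the speaker's absolute formant frequencies.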

RESULTS 

BASELINE MEASURES 

Baseline productions of [e]-words produced in the four different
groups were compared in order to verify that mean F1 and F2 values
were similar among the different real-word and pseudo-word
conditions (thus yielding similar magnitudes of F1 alteration
during the speech adaptation task). Mean and SD values of
F1 for the two lexical-change groups (LC-1 and LC-2) and the
two non-lexical-change control groups (NLC-1 and NLC-2)
respectively were: 756.7 (97.7), 670.0 (132.4), 722.7 (105.1), and
739.3 (122.4) Hz. Mean and SD values of F2 for the four groups
respectively were: 1887.6 (176.9), 1828.2 (162.7), 1831.6 (150.9),
and 1828.9 (195.5) Hz. A one-way ANOVA was carried out to
assess any differences among the four conditions in F1 and F2.
No significant differences were found between conditions for
either formant [F1: F(3, 35) = 1.12, p = 0.35; F2: F(3, 36) = 0.30,
p = 0.822].
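
For readers who wish to sanity-check these tests, a one-way ANOVA F ratio can be approximated from the group means and SDs alone. The sketch below assumes 10 subjects per group and treats the reported SDs as sample standard deviations; both are assumptions, and the function name is illustrative rather than taken from the original analysis.

```python
# Sketch only: one-way ANOVA F ratio recomputed from rounded group summaries.
# Assumes n = 10 per group and that the reported SDs are sample SDs.


def anova_from_summary(means, sds, ns):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    total_n = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    df_between = len(means) - 1
    df_within = total_n - len(means)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


NS = [10, 10, 10, 10]  # LC-1, LC-2, NLC-1, NLC-2
F1_MEANS, F1_SDS = [756.7, 670.0, 722.7, 739.3], [97.7, 132.4, 105.1, 122.4]
F2_MEANS, F2_SDS = [1887.6, 1828.2, 1831.6, 1828.9], [176.9, 162.7, 150.9, 195.5]

for label, means, sds in (("F1", F1_MEANS, F1_SDS), ("F2", F2_MEANS, F2_SDS)):
    f_ratio, df1, df2 = anova_from_summary(means, sds, NS)
    print(f"{label}: F({df1}, {df2}) = {f_ratio:.2f}")
```

The recomputed ratios come out close to, though not identical with, the reported values, which is expected given the rounding of the summary statistics and any difference in the number of observations entering the original tests.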

SPEECH ADAPTATION 

As shown in Figure 4A, subjects in both lexical-change
groups (LC-1, producing real words, and LC-2, producing
pseudo-words) exhibited a change in F1 frequency (compensatory
increase) in response to the auditory feedback manipulation,
though with a notable difference between groups in the
magnitude of the response. The compensatory change in F1 can
be seen to build up throughout the hold phase and then diminish
gradually during the after-effect phase. For the two non-lexical-change
groups (NLC-1, producing real words, and NLC-2, producing
pseudo-words), the time-course of compensatory change
in F1 is shown in Figure 5A. For both groups, a robust increase
in F1 frequency can be seen following the onset of altered auditory
feedback. In contrast with the two LC groups, however, very
little difference in the magnitude of compensation is observed
between the two NLC groups throughout the training.

As a first analysis step, we evaluated for each group whether the 
maximum observed compensatory changes in formant values (at 
the end of the hold phase) were statistically reliable, using Holm- 
Bonferroni-corrected single-sample t-tests comparing formant 



values against a hypothesized mean of 1 (the value representing 
no difference from baseline in normalized units). Note that in 
each of the two lexical-change groups (LC-1 and LC-2), a single
subject exhibited a change in F1 frequency in the direction
opposite that of the compensatory response, as indicated by a statistically
reliable F1 decrease at the end of the hold phase (trials
101-130) relative to baseline for those two subjects (p < 0.01).
No subjects in the non-lexical-change groups (NLC-1 and NLC-2)
exhibited a reliable F1 change in the negative direction. The
presence of subjects who show such "following" responses to
feedback perturbations has been noted in previous studies (e.g.,
Villacorta et al., 2007; MacDonald et al., 2010). In the present
study, such subjects were excluded from subsequent analyses to 
avoid averaging across responses that were qualitatively different
(negative vs. positive change). All four groups were found
to exhibit a reliable compensatory change in F1 at the end of
the hold phase [LC-1: t(8) = 2.78, p = 0.018; LC-2: t(8) = 5.30,
p < 0.001; NLC-1: t(9) = 8.39, p < 0.001; NLC-2: t(9) = 4.2,
p = 0.003].

FIGURE 4 | Change in F1 frequency (panel A: by blocks of 10 trials, pseudo-words vs. real words; panel B: Early Hold, Late Hold, After-Effect). [...] magnitude of the change in F1 is notably greater for LC-2 (blue line),
compared with LC-1 (red line). (B) Mean normalized F1 at three key
time-points during the adaptation task.
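
The single-sample tests reported above compare each group's normalized F1 against a reference value of 1. A minimal sketch of the test statistic is given below; the data are hypothetical values chosen for illustration and are not the study's measurements, and the function name is an assumption.

```python
# Sketch only: one-sample t statistic against a reference value of 1.
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, reference=1.0):
    """Return (t, df) for a one-sample t-test of mean(values) against reference."""
    n = len(values)
    standard_error = stdev(values) / sqrt(n)
    return (mean(values) - reference) / standard_error, n - 1


# Hypothetical late-hold normalized F1 values for nine speakers (not the study's data).
late_hold = [1.02, 1.05, 1.01, 1.08, 1.03, 1.00, 1.06, 1.04, 1.07]
t_value, df = one_sample_t(late_hold)
print(f"t({df}) = {t_value:.2f}")
```

Converting the statistic to a p-value requires the t distribution, which is left out here to keep the sketch free of external dependencies.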

Mean changes in formant values relative to baseline at three
time-points during the testing sequence are shown for the lexical-change
groups in Figure 4B, and for the non-lexical-change
groups in Figure 5B. To compare the magnitude of the adaptation
response among the two lexical-change conditions, the
two word-production conditions, and the three time-points, an
omnibus three-way mixed-factorial ANOVA was carried out
with WORD (word vs. non-word) as one between-subjects factor,
LEXICAL (lexical-change vs. non-lexical-change) as a second
between-subjects factor, and PHASE (Early-hold, Late-hold,
After-effect) as a within-subjects factor. The two between-subjects
main effects (WORD and LEXICAL) were found to be non-significant
[WORD: F(1, 34) = 0.87, p = 0.36; LEXICAL: F(1, 34) =
0.22, p = 0.64]; however, a reliable 2-way interaction between
WORD and LEXICAL was observed [F(1, 34) = 4.36, p = 0.04].
A highly significant main effect of the within-subjects variable
PHASE was also observed [F(2, 68) = 22.49, p < 0.001], but there
was no reliable 2-way interaction between PHASE and either of
the two group variables [PHASE × WORD: F(2, 68) = 2.52, p =
0.09; PHASE × LEXICAL: F(2, 68) = 1.20, p = 0.31]. The 3-way
interaction was also found to be non-significant [F(2, 68) = 2.046,
p = 0.14].

The 2-way interaction between WORD and LEXICAL condi- 
tions is of particular interest, since our main prediction involves 
a difference in the degree of adaptation between the word and 
non-word production under the lexical-change condition, with 
no such difference predicted between word groups under the non- 
lexical-change condition. Post-hoc pair-wise comparisons were
carried out using Holm-Bonferroni-corrected t-tests to examine
the reliability of these simple group effects. The tests were carried
out on adaptation performance at the end of the training
period (i.e., the Late-hold phase), as this represented the moment
at which speech adaptation was maximal for all groups. A reliable
difference in the magnitude of speech adaptation was observed
between the groups producing real words and pseudo-words in
the lexical-change condition [t(16) = 2.91, p = 0.04], while no
significant difference was observed between the two word conditions
in the non-lexical-change condition [t(18) = 0.84, p =
0.41]. A reliable difference was also observed between the two
groups producing real words, one in the lexical-change condition
and one in the non-lexical-change condition [t(17) = 2.86, p =
0.03]. No such difference was observed between the two groups
producing pseudo-words [t(17) = 1.55, p = 0.28].
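
The Holm-Bonferroni procedure used for these comparisons can be sketched as a step-down adjustment of a family of p-values. The input values below are hypothetical uncorrected p-values chosen for illustration; they are not the uncorrected values from this study, which are not reported.

```python
# Sketch only: Holm-Bonferroni step-down adjustment of a family of p-values.


def holm_bonferroni(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[index])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[index] = running_max
    return adjusted


# Hypothetical uncorrected p-values for four pairwise comparisons.
raw = [0.010, 0.400, 0.008, 0.140]
print(holm_bonferroni(raw))
```

Compared with a plain Bonferroni correction, which multiplies every p-value by the number of tests, the step-down scheme is less conservative for all but the smallest p-value while still controlling the family-wise error rate.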

DISCUSSION 

This study investigated the possible interaction between percep- 
tual and lexical information in guiding articulatory movements 
during spoken speech. Our hypothesis was that compensatory 
changes in speakers' articulatory patterns following an auditory- 
feedback perturbation affecting vowel quality would reflect sen- 
sitivity to the lexical status of the words produced. In particular, 
we predicted that the degree of compensation would vary when 
the auditory feedback manipulation had an effect on the lexical 
status of the word being produced (changing a real word into a 
pseudo-word, or vice versa). 



The present findings support this prediction: articulatory 
compensation to a decrease in F1 inducing a shift from real words 
to pseudo-words (group LC-1) was found to be significantly 
less than when the same F1 perturbations provoked a reverse 
shift from pseudo-words to real words (LC-2). Furthermore, the 
speech compensatory response for the LC-1 group (producing 
real-words under conditions of lexical-change) was found to be 
significantly less than that observed for the group producing real- 
words under non-lexical-change conditions (NLC-1). Crucially, 
no such difference in speech compensation was observed between 
the groups producing real-words and pseudo-words under con- 
trol conditions in which the same auditory feedback manip- 
ulation induced no lexical change (groups NLC-1 and NLC-2 
respectively). The observed difference in adaptation magnitude 
between the two LC groups can be interpreted as an extension 
of the original lexical effect on phoneme perception demon- 
strated by Ganong (1980). More precisely, this finding suggests 
that the lexical effect on perception can also be found in pro- 
duction, resulting from the fact that speakers tend to keep their 
acoustic speech outcomes within the auditory-perceptual space 
corresponding to the task-related side of the word/non-word 
boundary (real words or pseudo-words; see Figure 3). 

The conceptual and methodological payoffs of combining 
altered auditory feedback and lexical bias into a single experi- 
mental paradigm are twofold. First, evidence for lexically driven 
motor adaptation to auditory perturbations demonstrates for the 
first time a concurrent influence of phonetic and lexical informa- 
tion on the control of spoken speech, indicating that articulatory 
plasticity is in part constrained by the structure of abstract lexical 
knowledge. Second, observing a lexical effect through partici- 
pants' speech productions bypasses the methodological short- 
comings associated with explicit perceptual decision-making 
tasks, and thus strongly supports the view that the lexical influ- 
ence on perception involves a change in phonetic processing 
(e.g., Gow et al., 2008), and is not simply the result of a bias in 
lexically-driven decision-making. 

A number of early studies of top-down or contextual influ- 
ences on phonetic perception indicated that such effects may 
emerge only in slower reaction time ranges, suggesting that the 
effect is post-perceptual and thus supporting an independent, 
input-driven phoneme perception mechanism, free of higher-level 
factors (e.g., Fox, 1984; Connine and Clifton, 1987; Cutler 
et al., 1987; Miller and Dexter, 1988; Burton et al., 1989). While 
the present study does not allow for a precise characterization of 
the timing with which lexical knowledge exerts, or stops exerting, 
its influence on speech perceptual or motor functions, the present 
results are more consistent with the view of a more interactive 
speech perceptual system, with parallel processing of numer- 
ous streams of information, permitting early integration during 
speech and language processing. This view is also supported by 
neurobiological evidence for an early interaction between word- 
form representation areas (e.g., the supramarginal gyrus) and 
lower-level perceptual cortices (e.g., the superior temporal gyrus) 
in a time range prior to that of decision-making (Gow et al., 
2008). 

In current neuro-computational models of the sensorimo- 
tor control of speech production, such as the DIVA model 
(Guenther et al., 2006; Tourville and Guenther, 2011) or the State 
Feedback Control model of speech motor control (Houde and 
Nagarajan, 2011), changes in auditory feedback, such as those 
introduced in the present study, result in a mismatch between 
actual and expected auditory consequences of speech produc- 
tion. This gives rise to an auditory error signal that is used 
(to varying degrees) to directly alter the control of the oral 
motor system, as well as to update a predictive feed-forward 
control mechanism, thus improving subsequent speech motor 
plans. In the present study, the observed difference in the degree 
of sensory-motor adaptation between the two LC groups indi- 
cates an influence of lexical status on the perception of auditory 
feedback prior to the establishment of the auditory error sig- 
nal. Recent MEG studies have shown that auditory feedback 
processing, including the comparison between actual and pre- 
dicted acoustic outcomes under conditions of altered auditory 
feedback, occurs at latencies of less than 100 ms (Aliu et al., 
2009; Niziolek et al., 2013). There is also behavioral evidence 
of speech compensatory responses to unexpected auditory feed- 
back perturbations at latencies of less than 250 ms that reflect 
an influence of the phonetic boundary (Niziolek and Guenther, 
2013). Therefore, the present result suggests an influence of 
lexical status on sensorimotor function at a similarly short 
latency. 

The present findings indicate that the perceptual and motor 
sub-systems of the speech apparatus interact to a certain extent 
with higher-order lexical information, although how this interaction 
takes place, and at which stage of the production process, 
remains to be determined. This notion is absent from current 
neurocognitive models of speech production, such as the DIVA 
model (Guenther et al., 2006). Hickok (2012) recently proposed 
a hierarchical psycholinguistic-motor control model of speech, 
whereby activation of lexical information not only constitutes the 
starting point toward speech output (in the same vein as earlier 
models of production, cf. Indefrey and Levelt, 2004), but may 
also exert an influence on the acoustic and somatosensory 
feedback loops underlying spoken speech. Further work 
will hopefully yield a more detailed analysis of where and when 
this interaction takes place. Note also that lexical-sensorimotor 
interactions may be present on the somatosensory side of speech 
production, which would be worth exploring in future studies 
using real-time somatosensory perturbations during speech (e.g., 
Tremblay et al., 2008). 